Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection


Abstract

Multi-modal 3D object detection has been an active research topic in autonomous driving. Nevertheless, it is non-trivial to explore the cross-modal feature fusion between sparse 3D points and dense 2D pixels. Recent approaches either fuse image features with point cloud features that are projected onto the 2D image plane, or combine sparse point clouds with dense image pixels. These fusion approaches often suffer from severe information loss, thus causing sub-optimal performance. To address these problems, we construct a homogeneous structure between the point cloud and images to avoid projective information loss by transforming the camera features into the LiDAR 3D space. In this paper, we propose a homogeneous multi-modal feature fusion and interaction method (HMFI) for 3D object detection. Specifically, we first design an image voxel lifter module (IVLM) to lift 2D image features into the 3D space and generate homogeneous image voxel features. Then, we fuse the voxelized point cloud features with the image features from different regions by introducing a self-attention based query fusion mechanism (QFM). Next, we propose a voxel feature interaction module (VFIM) to enforce the consistency of semantic information from identical objects in the homogeneous point cloud and image voxel representations, which can provide object-level alignment guidance for the cross-modal feature fusion and strengthen the model's discriminative ability in complex backgrounds. We conduct extensive experiments on the KITTI and Waymo Open Dataset, and the proposed HMFI achieves better performance compared with state-of-the-art multi-modal methods. Particularly, for 3D detection of cyclists on the KITTI benchmark, HMFI surpasses all published algorithms by a large margin.
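As a rough illustration of the "lifting" step the abstract describes (moving 2D image features into the LiDAR 3D space so both modalities share a homogeneous voxel representation), the sketch below projects voxel centers into the image plane and samples the feature map at the projected locations. This is a minimal NumPy sketch under assumed pinhole-camera conventions, not the paper's actual IVLM implementation; the function name, the matrix layouts, and the nearest-neighbour sampling are illustrative assumptions.

```python
import numpy as np

def lift_image_features_to_voxels(img_feat, voxel_centers, K, T_cam_from_lidar):
    """Sample 2D image features at the projection of each 3D voxel center.

    img_feat:          (H, W, C) image feature map
    voxel_centers:     (N, 3) voxel centers in LiDAR coordinates
    K:                 (3, 3) camera intrinsic matrix
    T_cam_from_lidar:  (4, 4) rigid transform, LiDAR frame -> camera frame
    Returns an (N, C) array; voxels that project outside the image (or lie
    behind the camera) receive zero features.
    """
    H, W, C = img_feat.shape
    N = voxel_centers.shape[0]

    # Transform voxel centers into the camera frame (homogeneous coords).
    homo = np.hstack([voxel_centers, np.ones((N, 1))])      # (N, 4)
    cam = (T_cam_from_lidar @ homo.T).T[:, :3]              # (N, 3)

    # Keep only voxels in front of the camera.
    valid = cam[:, 2] > 1e-3

    # Pinhole projection into pixel coordinates.
    uv = (K @ cam.T).T                                      # (N, 3)
    uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-3, None)        # (N, 2)

    # Nearest-neighbour sampling (a real model would use bilinear sampling).
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    valid &= (u >= 0) & (u < W) & (v >= 0) & (v < H)

    out = np.zeros((N, C), dtype=img_feat.dtype)
    out[valid] = img_feat[v[valid], u[valid]]
    return out
```

The resulting per-voxel image features live in the same 3D grid as the voxelized point cloud, which is what makes attention-based fusion (as in the QFM described above) straightforward to apply.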


Similar Resources

Multi Sensor Fusion for Object Detection Using Generalized Feature Models

This paper presents a multi-sensor tracking system and introduces the use of new generalized feature models. Detecting and recognizing objects as self-contained parts of the real world with two or more sensors of the same or of several types requires, on the one hand, fusion methods suitable for combining the data coming from the set of sensors in an optimal manner. This is realized by a sensor fusi...


Multi-modal Interaction for 3D Modeling

The actual usability of a Virtual Environment (VE) depends to a large extent on the multi-modal interaction interfaces. A multi-modal interaction system combines visual information with many interaction methods to provide flexible and powerful dialogue approaches, thus enabling users to choose single or multiple interactions. This paper is a review of existing multi-modal interaction interfaces...


Improving 3D perception for Object Detection, Classification and Localization using Fused Multi-modal Sensors

Object perception in 3-D is a highly challenging problem in computer vision. The major concern in these tasks involves object occlusion, different object poses, appearance, and limited perception of the environment by individual sensors in terms of range measurements. In this particular project, our goal is to improve 3D perception of the environment by using fusion from lidars and cameras with f...


Predicting Depression Severity by Multi-Modal Feature Engineering and Fusion

We present our preliminary work to determine if a patient's vocal acoustic, linguistic, and facial patterns could predict clinical ratings of depression severity, namely the Patient Health Questionnaire depression scale (PHQ-8). We proposed a multi-modal fusion model that combines three different modalities: audio, video, and text features. By training over the AVEC2017 dataset, our proposed model ou...


Hybridization of Facial Features and Use of Multi Modal Information for 3D Face Recognition

Despite achieving good performance in controlled environments, conventional 3D face recognition systems still encounter problems in handling large variations in lighting conditions, facial expression, and head pose. Humans use a hybrid approach to recognize faces, and therefore in this proposed method the human face recognition ability is incorporated by combining global and local ...



Journal

Journal title: Lecture Notes in Computer Science

Year: 2022

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-031-19839-7_40